Inventi Impact: Audio, Speech & Music Processing

Articles

Inventi:easm/114450/26

Two-Stage Domain Adaptation for LLM-Based ASR by Decoupling Linguistic and Acoustic Factors

01-Apr-2026 Research 2026 : April-June

Lin Zheng, Xuyang Wang, Qingwei Zhao, Ta Li

Large language models (LLMs) are increasingly applied to Automatic Speech Recognition (ASR) and have brought significant advances. However, the performance of LLM-based ASR (LLM-ASR) models remains unsatisfactory across domains, because both acoustic and linguistic conditions shift between the source and target domains. To address this challenge, we propose a decoupled two-stage domain adaptation framework that separates adaptation into a text-only stage and an audio-only stage. In the first stage, we use abundant target-domain text to refine the LLM component, improving its contextual and linguistic alignment with the target domain. In the second stage, we apply pseudo-labeling to unlabeled target-domain audio and introduce two key enhancements: (1) a decoupled auxiliary Connectionist Temporal Classification (CTC) loss that improves the robustness of the speech encoder under different acoustic conditions; and (2) a synchronous LLM tuning strategy that lets the LLM keep learning linguistic alignment from pseudo-labeled transcriptions enriched with domain textual knowledge. Experimental results show that the proposed methods significantly improve LLM-ASR performance in the target domain, achieving a relative word error rate reduction of 19.2%.
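For readers unfamiliar with the metric quoted above, the following is a minimal Python sketch of how a word error rate (WER) and a relative WER reduction are conventionally computed. It is an illustration only, not the authors' evaluation code: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, whitespace tokenization is an assumption, and the numbers in the usage lines are made up and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers for illustration only (not results from the paper).
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
example_reduction = relative_wer_reduction(baseline_wer=20.0, adapted_wer=15.0)
```

A relative reduction is measured against the baseline WER rather than in absolute percentage points, so the same relative figure can correspond to very different absolute gains depending on the starting error rate.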

How to Cite this Article
Attribution/ CC Compliant Citation: Zheng, L.; Wang, X.; Zhao, Q.; Li, T. Two-Stage Domain Adaptation for LLM-Based ASR by Decoupling Linguistic and Acoustic Factors. Appl. Sci. 2026, 16, 60. https://doi.org/10.3390/app16010060 http://creativecommons.org/licenses/by/4.0/ Some formatting elements, header, footer, logos, dates and pagination were modified while adapting this article.
